{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "问题描述  \n",
    "以课上给出的代码为基础，通过适当的改造，修改初始化⽅式，增加正则化，调整神经元个数，增加隐层等，将这\n",
    "个模型的验证集validation准确率提⾼到98%以上。\n",
    "  \n",
    "解题提示  \n",
    "https://www.tinymind.com/ai100/notebooks/74  \n",
    "给出代码的运⾏log截图并提供⼼得体会⽂档解释对模型的各种修改起了什么样的作⽤。  \n",
    "  \n",
    "批改标准  \n",
    "代码不作为评判标准，如果运⾏正确，则认为代码没有错误。  \n",
    "没有明显报错的正常的log输出 ，log中的模型准确率达到98`分。  \n",
    "如何修改隐层数量，修改后会起到什么样的效果10分。  \n",
    "如何神经元个数，起到了什么样的效果10分。  \n",
    "如何在模型中添加L1/L2正则化，正则化起什么作⽤10分。  \n",
    "使⽤不同的初始化⽅式对模型有什么影响10分。  "
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 1,
   "metadata": {},
   "outputs": [
    {
     "name": "stderr",
     "output_type": "stream",
     "text": [
      "C:\\Users\\ilove\\Anaconda3\\envs\\tfl1.14\\lib\\site-packages\\tensorflow\\python\\framework\\dtypes.py:516: FutureWarning: Passing (type, 1) or '1type' as a synonym of type is deprecated; in a future version of numpy, it will be understood as (type, (1,)) / '(1,)type'.\n",
      "  _np_qint8 = np.dtype([(\"qint8\", np.int8, 1)])\n",
      "C:\\Users\\ilove\\Anaconda3\\envs\\tfl1.14\\lib\\site-packages\\tensorflow\\python\\framework\\dtypes.py:517: FutureWarning: Passing (type, 1) or '1type' as a synonym of type is deprecated; in a future version of numpy, it will be understood as (type, (1,)) / '(1,)type'.\n",
      "  _np_quint8 = np.dtype([(\"quint8\", np.uint8, 1)])\n",
      "C:\\Users\\ilove\\Anaconda3\\envs\\tfl1.14\\lib\\site-packages\\tensorflow\\python\\framework\\dtypes.py:518: FutureWarning: Passing (type, 1) or '1type' as a synonym of type is deprecated; in a future version of numpy, it will be understood as (type, (1,)) / '(1,)type'.\n",
      "  _np_qint16 = np.dtype([(\"qint16\", np.int16, 1)])\n",
      "C:\\Users\\ilove\\Anaconda3\\envs\\tfl1.14\\lib\\site-packages\\tensorflow\\python\\framework\\dtypes.py:519: FutureWarning: Passing (type, 1) or '1type' as a synonym of type is deprecated; in a future version of numpy, it will be understood as (type, (1,)) / '(1,)type'.\n",
      "  _np_quint16 = np.dtype([(\"quint16\", np.uint16, 1)])\n",
      "C:\\Users\\ilove\\Anaconda3\\envs\\tfl1.14\\lib\\site-packages\\tensorflow\\python\\framework\\dtypes.py:520: FutureWarning: Passing (type, 1) or '1type' as a synonym of type is deprecated; in a future version of numpy, it will be understood as (type, (1,)) / '(1,)type'.\n",
      "  _np_qint32 = np.dtype([(\"qint32\", np.int32, 1)])\n",
      "C:\\Users\\ilove\\Anaconda3\\envs\\tfl1.14\\lib\\site-packages\\tensorflow\\python\\framework\\dtypes.py:525: FutureWarning: Passing (type, 1) or '1type' as a synonym of type is deprecated; in a future version of numpy, it will be understood as (type, (1,)) / '(1,)type'.\n",
      "  np_resource = np.dtype([(\"resource\", np.ubyte, 1)])\n",
      "C:\\Users\\ilove\\Anaconda3\\envs\\tfl1.14\\lib\\site-packages\\tensorboard\\compat\\tensorflow_stub\\dtypes.py:541: FutureWarning: Passing (type, 1) or '1type' as a synonym of type is deprecated; in a future version of numpy, it will be understood as (type, (1,)) / '(1,)type'.\n",
      "  _np_qint8 = np.dtype([(\"qint8\", np.int8, 1)])\n",
      "C:\\Users\\ilove\\Anaconda3\\envs\\tfl1.14\\lib\\site-packages\\tensorboard\\compat\\tensorflow_stub\\dtypes.py:542: FutureWarning: Passing (type, 1) or '1type' as a synonym of type is deprecated; in a future version of numpy, it will be understood as (type, (1,)) / '(1,)type'.\n",
      "  _np_quint8 = np.dtype([(\"quint8\", np.uint8, 1)])\n",
      "C:\\Users\\ilove\\Anaconda3\\envs\\tfl1.14\\lib\\site-packages\\tensorboard\\compat\\tensorflow_stub\\dtypes.py:543: FutureWarning: Passing (type, 1) or '1type' as a synonym of type is deprecated; in a future version of numpy, it will be understood as (type, (1,)) / '(1,)type'.\n",
      "  _np_qint16 = np.dtype([(\"qint16\", np.int16, 1)])\n",
      "C:\\Users\\ilove\\Anaconda3\\envs\\tfl1.14\\lib\\site-packages\\tensorboard\\compat\\tensorflow_stub\\dtypes.py:544: FutureWarning: Passing (type, 1) or '1type' as a synonym of type is deprecated; in a future version of numpy, it will be understood as (type, (1,)) / '(1,)type'.\n",
      "  _np_quint16 = np.dtype([(\"quint16\", np.uint16, 1)])\n",
      "C:\\Users\\ilove\\Anaconda3\\envs\\tfl1.14\\lib\\site-packages\\tensorboard\\compat\\tensorflow_stub\\dtypes.py:545: FutureWarning: Passing (type, 1) or '1type' as a synonym of type is deprecated; in a future version of numpy, it will be understood as (type, (1,)) / '(1,)type'.\n",
      "  _np_qint32 = np.dtype([(\"qint32\", np.int32, 1)])\n",
      "C:\\Users\\ilove\\Anaconda3\\envs\\tfl1.14\\lib\\site-packages\\tensorboard\\compat\\tensorflow_stub\\dtypes.py:550: FutureWarning: Passing (type, 1) or '1type' as a synonym of type is deprecated; in a future version of numpy, it will be understood as (type, (1,)) / '(1,)type'.\n",
      "  np_resource = np.dtype([(\"resource\", np.ubyte, 1)])\n"
     ]
    },
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "WARNING:tensorflow:From <ipython-input-1-134662611853>:11: read_data_sets (from tensorflow.contrib.learn.python.learn.datasets.mnist) is deprecated and will be removed in a future version.\n",
      "Instructions for updating:\n",
      "Please use alternatives such as official/mnist/dataset.py from tensorflow/models.\n",
      "WARNING:tensorflow:From C:\\Users\\ilove\\Anaconda3\\envs\\tfl1.14\\lib\\site-packages\\tensorflow\\contrib\\learn\\python\\learn\\datasets\\mnist.py:260: maybe_download (from tensorflow.contrib.learn.python.learn.datasets.base) is deprecated and will be removed in a future version.\n",
      "Instructions for updating:\n",
      "Please write your own downloading logic.\n",
      "WARNING:tensorflow:From C:\\Users\\ilove\\Anaconda3\\envs\\tfl1.14\\lib\\site-packages\\tensorflow\\contrib\\learn\\python\\learn\\datasets\\mnist.py:262: extract_images (from tensorflow.contrib.learn.python.learn.datasets.mnist) is deprecated and will be removed in a future version.\n",
      "Instructions for updating:\n",
      "Please use tf.data to implement this functionality.\n",
      "Extracting ./train-images-idx3-ubyte.gz\n",
      "WARNING:tensorflow:From C:\\Users\\ilove\\Anaconda3\\envs\\tfl1.14\\lib\\site-packages\\tensorflow\\contrib\\learn\\python\\learn\\datasets\\mnist.py:267: extract_labels (from tensorflow.contrib.learn.python.learn.datasets.mnist) is deprecated and will be removed in a future version.\n",
      "Instructions for updating:\n",
      "Please use tf.data to implement this functionality.\n",
      "Extracting ./train-labels-idx1-ubyte.gz\n",
      "WARNING:tensorflow:From C:\\Users\\ilove\\Anaconda3\\envs\\tfl1.14\\lib\\site-packages\\tensorflow\\contrib\\learn\\python\\learn\\datasets\\mnist.py:110: dense_to_one_hot (from tensorflow.contrib.learn.python.learn.datasets.mnist) is deprecated and will be removed in a future version.\n",
      "Instructions for updating:\n",
      "Please use tf.one_hot on tensors.\n",
      "Extracting ./t10k-images-idx3-ubyte.gz\n",
      "Extracting ./t10k-labels-idx1-ubyte.gz\n",
      "WARNING:tensorflow:From C:\\Users\\ilove\\Anaconda3\\envs\\tfl1.14\\lib\\site-packages\\tensorflow\\contrib\\learn\\python\\learn\\datasets\\mnist.py:290: DataSet.__init__ (from tensorflow.contrib.learn.python.learn.datasets.mnist) is deprecated and will be removed in a future version.\n",
      "Instructions for updating:\n",
      "Please use alternatives such as official/mnist/dataset.py from tensorflow/models.\n"
     ]
    }
   ],
   "source": [
    "import numpy as np\n",
    "import tensorflow as tf\n",
    "from tensorflow.examples.tutorials.mnist import input_data\n",
    "from matplotlib import pyplot as plt\n",
    "import cv2 as cv\n",
    "\n",
    "%matplotlib inline\n",
    "\n",
    "tf.logging.set_verbosity(tf.logging.INFO)\n",
    "\n",
    "mnist = input_data.read_data_sets(\"./\", one_hot=True)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 2,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "WARNING:tensorflow:From <ipython-input-2-77df0a3788c2>:17: calling dropout (from tensorflow.python.ops.nn_ops) with keep_prob is deprecated and will be removed in a future version.\n",
      "Instructions for updating:\n",
      "Please use `rate` instead of `keep_prob`. Rate should be set to `rate = 1 - keep_prob`.\n",
      "WARNING:tensorflow:From <ipython-input-2-77df0a3788c2>:67: arg_max (from tensorflow.python.ops.gen_math_ops) is deprecated and will be removed in a future version.\n",
      "Instructions for updating:\n",
      "Use `tf.math.argmax` instead\n",
      "after 100 training steps, the loss is 0.568072, the validation accuracy is 0.8894\n",
      "after 200 training steps, the loss is 0.393977, the validation accuracy is 0.933\n",
      "after 300 training steps, the loss is 0.269795, the validation accuracy is 0.9384\n",
      "after 400 training steps, the loss is 0.346258, the validation accuracy is 0.9448\n",
      "after 500 training steps, the loss is 0.325909, the validation accuracy is 0.9504\n",
      "after 600 training steps, the loss is 0.186444, the validation accuracy is 0.9596\n",
      "after 700 training steps, the loss is 0.1451, the validation accuracy is 0.9574\n",
      "after 800 training steps, the loss is 0.113851, the validation accuracy is 0.9632\n",
      "after 900 training steps, the loss is 0.178299, the validation accuracy is 0.9662\n",
      "after 1000 training steps, the loss is 0.16224, the validation accuracy is 0.9672\n",
      "after 1100 training steps, the loss is 0.136767, the validation accuracy is 0.9684\n",
      "after 1200 training steps, the loss is 0.282477, the validation accuracy is 0.9648\n",
      "after 1300 training steps, the loss is 0.168275, the validation accuracy is 0.9698\n",
      "after 1400 training steps, the loss is 0.0863475, the validation accuracy is 0.969\n",
      "after 1500 training steps, the loss is 0.0564224, the validation accuracy is 0.974\n",
      "after 1600 training steps, the loss is 0.123642, the validation accuracy is 0.9658\n",
      "after 1700 training steps, the loss is 0.202094, the validation accuracy is 0.9736\n",
      "after 1800 training steps, the loss is 0.189724, the validation accuracy is 0.974\n",
      "after 1900 training steps, the loss is 0.0885059, the validation accuracy is 0.9758\n",
      "after 2000 training steps, the loss is 0.0682991, the validation accuracy is 0.9722\n",
      "after 2100 training steps, the loss is 0.11944, the validation accuracy is 0.9754\n",
      "after 2200 training steps, the loss is 0.140425, the validation accuracy is 0.971\n",
      "after 2300 training steps, the loss is 0.122413, the validation accuracy is 0.976\n",
      "after 2400 training steps, the loss is 0.102095, the validation accuracy is 0.9772\n",
      "after 2500 training steps, the loss is 0.0772645, the validation accuracy is 0.9756\n",
      "after 2600 training steps, the loss is 0.0495494, the validation accuracy is 0.9756\n",
      "after 2700 training steps, the loss is 0.119533, the validation accuracy is 0.9766\n",
      "after 2800 training steps, the loss is 0.0529126, the validation accuracy is 0.9762\n",
      "after 2900 training steps, the loss is 0.0535589, the validation accuracy is 0.9738\n",
      "after 3000 training steps, the loss is 0.0341745, the validation accuracy is 0.9756\n",
      "after 3100 training steps, the loss is 0.0702855, the validation accuracy is 0.9774\n",
      "after 3200 training steps, the loss is 0.183855, the validation accuracy is 0.9766\n",
      "after 3300 training steps, the loss is 0.0497745, the validation accuracy is 0.9774\n",
      "after 3400 training steps, the loss is 0.109724, the validation accuracy is 0.9762\n",
      "after 3500 training steps, the loss is 0.0602179, the validation accuracy is 0.9756\n",
      "after 3600 training steps, the loss is 0.0317644, the validation accuracy is 0.9758\n",
      "after 3700 training steps, the loss is 0.0811979, the validation accuracy is 0.978\n",
      "after 3800 training steps, the loss is 0.128973, the validation accuracy is 0.977\n",
      "after 3900 training steps, the loss is 0.0360632, the validation accuracy is 0.9792\n",
      "after 4000 training steps, the loss is 0.0101847, the validation accuracy is 0.9766\n",
      "after 4100 training steps, the loss is 0.143637, the validation accuracy is 0.9786\n",
      "after 4200 training steps, the loss is 0.154898, the validation accuracy is 0.9764\n",
      "after 4300 training steps, the loss is 0.0969512, the validation accuracy is 0.9798\n",
      "after 4400 training steps, the loss is 0.104048, the validation accuracy is 0.9778\n",
      "after 4500 training steps, the loss is 0.0441593, the validation accuracy is 0.977\n",
      "after 4600 training steps, the loss is 0.102842, the validation accuracy is 0.9768\n",
      "after 4700 training steps, the loss is 0.0407263, the validation accuracy is 0.978\n",
      "after 4800 training steps, the loss is 0.0449675, the validation accuracy is 0.9806\n",
      "after 4900 training steps, the loss is 0.144594, the validation accuracy is 0.9782\n",
      "after 5000 training steps, the loss is 0.0577215, the validation accuracy is 0.9804\n",
      "after 5100 training steps, the loss is 0.0280402, the validation accuracy is 0.9802\n",
      "after 5200 training steps, the loss is 0.0387942, the validation accuracy is 0.9838\n",
      "after 5300 training steps, the loss is 0.0742095, the validation accuracy is 0.9812\n",
      "after 5400 training steps, the loss is 0.0713996, the validation accuracy is 0.9826\n",
      "after 5500 training steps, the loss is 0.0196167, the validation accuracy is 0.9832\n",
      "after 5600 training steps, the loss is 0.0452627, the validation accuracy is 0.9842\n",
      "after 5700 training steps, the loss is 0.00394212, the validation accuracy is 0.9832\n",
      "after 5800 training steps, the loss is 0.00803489, the validation accuracy is 0.9814\n",
      "after 5900 training steps, the loss is 0.0347539, the validation accuracy is 0.9826\n",
      "after 6000 training steps, the loss is 0.0168561, the validation accuracy is 0.9828\n",
      "after 6100 training steps, the loss is 0.0375034, the validation accuracy is 0.982\n",
      "after 6200 training steps, the loss is 0.0307775, the validation accuracy is 0.9854\n",
      "after 6300 training steps, the loss is 0.0508355, the validation accuracy is 0.9826\n",
      "after 6400 training steps, the loss is 0.00139895, the validation accuracy is 0.983\n",
      "after 6500 training steps, the loss is 0.0452955, the validation accuracy is 0.9834\n",
      "after 6600 training steps, the loss is 0.0215575, the validation accuracy is 0.9844\n",
      "after 6700 training steps, the loss is 0.00518003, the validation accuracy is 0.9842\n",
      "after 6800 training steps, the loss is 0.0464238, the validation accuracy is 0.9844\n",
      "after 6900 training steps, the loss is 0.00315142, the validation accuracy is 0.9834\n",
      "after 7000 training steps, the loss is 0.0508644, the validation accuracy is 0.9832\n",
      "after 7100 training steps, the loss is 0.0878609, the validation accuracy is 0.9832\n",
      "after 7200 training steps, the loss is 0.0061207, the validation accuracy is 0.9834\n",
      "after 7300 training steps, the loss is 0.0229486, the validation accuracy is 0.9828\n",
      "after 7400 training steps, the loss is 0.0320185, the validation accuracy is 0.9822\n",
      "after 7500 training steps, the loss is 0.0118636, the validation accuracy is 0.9828\n",
      "after 7600 training steps, the loss is 0.0108717, the validation accuracy is 0.9834\n",
      "after 7700 training steps, the loss is 0.0266831, the validation accuracy is 0.9826\n",
      "after 7800 training steps, the loss is 0.0482951, the validation accuracy is 0.9838\n",
      "after 7900 training steps, the loss is 0.00800201, the validation accuracy is 0.9844\n",
      "after 8000 training steps, the loss is 0.00597741, the validation accuracy is 0.984\n",
      "after 8100 training steps, the loss is 0.0328761, the validation accuracy is 0.9846\n",
      "after 8200 training steps, the loss is 0.0269113, the validation accuracy is 0.9842\n",
      "after 8300 training steps, the loss is 0.00564484, the validation accuracy is 0.9842\n",
      "after 8400 training steps, the loss is 0.0329559, the validation accuracy is 0.9842\n",
      "after 8500 training steps, the loss is 0.0152452, the validation accuracy is 0.9834\n",
      "after 8600 training steps, the loss is 0.00832337, the validation accuracy is 0.984\n",
      "after 8700 training steps, the loss is 0.0105193, the validation accuracy is 0.9848\n",
      "after 8800 training steps, the loss is 0.0184166, the validation accuracy is 0.9844\n",
      "after 8900 training steps, the loss is 0.021189, the validation accuracy is 0.9844\n",
      "after 9000 training steps, the loss is 0.00247613, the validation accuracy is 0.984\n",
      "after 9100 training steps, the loss is 0.00995113, the validation accuracy is 0.9834\n",
      "after 9200 training steps, the loss is 0.0245819, the validation accuracy is 0.9852\n"
     ]
    },
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "after 9300 training steps, the loss is 0.00428083, the validation accuracy is 0.984\n",
      "after 9400 training steps, the loss is 0.00211591, the validation accuracy is 0.9844\n",
      "after 9500 training steps, the loss is 0.0121074, the validation accuracy is 0.984\n",
      "after 9600 training steps, the loss is 0.0220274, the validation accuracy is 0.9852\n",
      "after 9700 training steps, the loss is 0.000920179, the validation accuracy is 0.9846\n",
      "after 9800 training steps, the loss is 0.0108772, the validation accuracy is 0.9834\n",
      "after 9900 training steps, the loss is 0.0153204, the validation accuracy is 0.9836\n",
      "training is finished!\n",
      "the test accuracy is  0.9847\n"
     ]
    }
   ],
   "source": [
    "x = tf.placeholder(tf.float32, [None, 784], name=\"x\")\n",
    "y = tf.placeholder(tf.float32, [None, 10], name=\"y\")\n",
    "learning_rate = tf.placeholder(tf.float32)\n",
    "keep_prob = tf.placeholder(tf.float32)\n",
    "\n",
    "def initialize(shape, stddev=0.12):  # 老师示范的初始化方式\n",
    "    return tf.truncated_normal(shape, stddev=stddev) # Normal Distribution \n",
    "\n",
    "# 第一层隐层的神经元个数,修改神经元个数：100 --> 450\n",
    "L1_units_count = 450\n",
    "\n",
    "W_1 = tf.get_variable('W_1', shape=[784, L1_units_count], initializer=tf.contrib.layers.variance_scaling_initializer()) # 使用 MSRA 初始化\n",
    "b_1 = tf.get_variable('b_1', shape=[L1_units_count], initializer=tf.contrib.layers.variance_scaling_initializer())\n",
    "\n",
    "logits_1 = tf.matmul(x, W_1) + b_1\n",
    "output_1 = tf.nn.relu(logits_1)\n",
    "output_1 = tf.nn.dropout(output_1, keep_prob) # dropout \n",
    "\n",
    "# 第二隐层\n",
    "L2_units_count = 300\n",
    "\n",
    "W_2 = tf.get_variable('W_2', shape=[L1_units_count, L2_units_count], initializer=tf.contrib.layers.variance_scaling_initializer())\n",
    "b_2 = tf.get_variable('b_2', shape=[L2_units_count], initializer=tf.contrib.layers.variance_scaling_initializer())\n",
    "\n",
    "logits_2 = tf.matmul(output_1, W_2) + b_2\n",
    "output_2 = tf.nn.relu(logits_2)\n",
    "output_2 = tf.nn.dropout(output_2, keep_prob)\n",
    "\n",
    "# 第三隐层\n",
    "L3_units_count = 200\n",
    "\n",
    "W_3 = tf.get_variable('W_3', shape=[L2_units_count, L3_units_count], initializer=tf.contrib.layers.variance_scaling_initializer())\n",
    "b_3 = tf.get_variable('b_3', shape=[L3_units_count], initializer=tf.contrib.layers.variance_scaling_initializer())\n",
    "\n",
    "logits_3 = tf.matmul(output_2, W_3) + b_3\n",
    "output_3 = tf.nn.relu(logits_3)\n",
    "output_3 = tf.nn.dropout(output_3, keep_prob)\n",
    "\n",
    "# 第四隐层\n",
    "L4_units_count = 100\n",
    "\n",
    "W_4 = tf.get_variable('W_4', shape=[L3_units_count, L4_units_count], initializer=tf.contrib.layers.variance_scaling_initializer())\n",
    "b_4 = tf.get_variable('b_4', shape=[L4_units_count], initializer=tf.contrib.layers.variance_scaling_initializer())\n",
    "\n",
    "logits_4 = tf.matmul(output_3, W_4) + b_4\n",
    "output_4 = tf.nn.relu(logits_4)\n",
    "output_4 = tf.nn.dropout(output_4, keep_prob)\n",
    "\n",
    "# 输出层: 沒有 dropout\n",
    "L5_units_count = 10\n",
    "\n",
    "W_5 = tf.get_variable('W_5', shape=[L4_units_count, L5_units_count], initializer=tf.contrib.layers.variance_scaling_initializer())\n",
    "b_5 = tf.get_variable('b_5', shape=[L5_units_count], initializer=tf.contrib.layers.variance_scaling_initializer())\n",
    "\n",
    "logits_5 = tf.matmul(output_4, W_5) + b_5\n",
    "logits = logits_5  # 先保持未激活\n",
    "\n",
    "cross_entropy_loss = tf.reduce_mean(tf.nn.softmax_cross_entropy_with_logits_v2(logits=logits, labels=y))\n",
    "\n",
    "# 加 L2 正则项\n",
    "l2_loss = tf.losses.get_regularization_loss()\n",
    "\n",
    "total_loss = cross_entropy_loss + 7e-5 * l2_loss # 7e-5 是网络上的范例\n",
    "optimizer = tf.train.GradientDescentOptimizer(learning_rate).minimize(total_loss)\n",
    "\n",
    "pred = tf.nn.softmax(logits)\n",
    "correct_pred = tf.equal(tf.arg_max(pred, 1), tf.arg_max(y, 1))\n",
    "accuracy = tf.reduce_mean(tf.cast(correct_pred, tf.float32))\n",
    "\n",
    "batchsize = 100\n",
    "training_step = 10000\n",
    "lr = 0.3\n",
    "with tf.Session() as sess:\n",
    "    sess.run(tf.global_variables_initializer())\n",
    "    \n",
    "    # 定义验证集与测试集\n",
    "    validate_data = {\n",
    "        x: mnist.validation.images,\n",
    "        y: mnist.validation.labels,\n",
    "        keep_prob: 0.95,\n",
    "    }\n",
    "    test_data = {\n",
    "        x: mnist.test.images,\n",
    "        y: mnist.test.labels,\n",
    "        keep_prob: 1\n",
    "    }\n",
    "    \n",
    "    for i in range(training_step):\n",
    "        if i == 5000:\n",
    "            lr = 0.1\n",
    "        if i == 8000:\n",
    "            lr = 0.03       \n",
    "        if i == 9000:\n",
    "            lr = 0.01\n",
    "        #if i == 9500:\n",
    "        #    lr = 0.001\n",
    "\n",
    "        xs, ys = mnist.train.next_batch(batchsize)\n",
    "        _,loss = sess.run([optimizer, cross_entropy_loss],\n",
    "                         feed_dict={\n",
    "                             x: xs,\n",
    "                             y: ys,\n",
    "                             keep_prob: 0.75,\n",
    "                             learning_rate: lr\n",
    "                         })\n",
    "        \n",
    "        # 每100次训练打印一次损失值与验证准确率\n",
    "        if i >0 and i%100 == 0:\n",
    "            validate_accuracy = sess.run(accuracy, feed_dict=validate_data)\n",
    "            print(\"after %d training steps, the loss is %g, the validation accuracy is %g\" % (i, loss, validate_accuracy))\n",
    "            \n",
    "    print(\"training is finished!\")\n",
    "    \n",
    "    acc = sess.run(accuracy, feed_dict=test_data)\n",
    "    print(\"the test accuracy is \", acc)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "*测试准确率：0.9847*"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### 1. 正常的log输出 ，log中的模型准确率达到98%"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "测试样本准确率稳定维持在98％以上。"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### 2. 如何修改隐层数量，修改后会起到什么样的效果。"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "1. 修改隐层数量，就是增加神经网络的深度，根据前面神经元的个数以及本层设计的神经元个数确定权重与偏差项的大小(shape)。  \n",
    "2. 修改隐层数量也就是增加了神经网络的深度，达到的效果是使网络的抽象能力更强，学习到的特征更加复杂，结果的准确率越高。  \n",
    "3. 一般认为，增加隐层数可以降低网络误差（也有文献认为不一定能有效降低），提高精度，但也使网络复杂化，从而增加网络的训练时间和出现“过拟合”的倾向。一般地，靠增加隐层节点数来获得较低的误差，其训练效果要比增加隐层数更容易实现。    "
   ]
  },
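  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "The role of depth can be sketched outside the TensorFlow graph. Below is a minimal NumPy illustration (a toy, not the model above): the depth of the network is just the length of a `layer_sizes` list, and adding a hidden layer means inserting one more entry, which creates one more (W, b) pair.\n",
    "\n",
    "```python\n",
    "import numpy as np\n",
    "\n",
    "def build_params(layer_sizes, rng):\n",
    "    # one (W, b) pair per consecutive pair of sizes; He-style scaling for ReLU\n",
    "    return [(rng.standard_normal((m, n)) * np.sqrt(2.0 / m), np.zeros(n))\n",
    "            for m, n in zip(layer_sizes[:-1], layer_sizes[1:])]\n",
    "\n",
    "def forward(x, params):\n",
    "    for i, (W, b) in enumerate(params):\n",
    "        x = x @ W + b\n",
    "        if i < len(params) - 1:  # ReLU on hidden layers, not on the output\n",
    "            x = np.maximum(x, 0.0)\n",
    "    return x\n",
    "\n",
    "rng = np.random.default_rng(0)\n",
    "# same sizes as the model above: four hidden layers plus the 10-way output\n",
    "params = build_params([784, 450, 300, 200, 100, 10], rng)\n",
    "out = forward(rng.standard_normal((5, 784)), params)\n",
    "print(out.shape)  # (5, 10)\n",
    "```"
   ]
  },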
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### 3. 如何修改神经元个数，起到了什么样的效果。"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "1. 修改神经元个数，可以通过对网络结构中的 w 与 b 的 shape 进行修改。  \n",
    "2. 修改神经元个数相当于增加了权重参数的个数，适当的增加神经元个数可以提高网络的准确率和精度，但是过多的神经元也是导致过拟合的“元凶”。  \n",
    "3. 本题使用 dropout 技巧，因此每个隐层的节点个数不是固定的，也是为了避免过度学习造成的“过拟合”现象。  "
   ]
  },
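  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "The cost of widening a layer is easy to quantify: a fully connected layer has fan_in x fan_out weights plus fan_out biases. A small helper (illustrative only) compares a single-hidden-layer baseline at 100 units with the widened 450-unit version:\n",
    "\n",
    "```python\n",
    "def param_count(layer_sizes):\n",
    "    # weights (m * n) plus biases (n) for each fully connected layer\n",
    "    return sum(m * n + n for m, n in zip(layer_sizes[:-1], layer_sizes[1:]))\n",
    "\n",
    "print(param_count([784, 100, 10]))  # 79510 parameters at 100 hidden units\n",
    "print(param_count([784, 450, 10]))  # 357760 parameters at 450 hidden units\n",
    "```"
   ]
  },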
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### 4. 如何在模型中添加L1/L2正则化，正则化起什么作⽤。"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "1. 可在创建变量或图层时添加正则化程序：  \n",
    "(本题采用)l2_loss = tf.losses.get_regularization_loss()   \n",
    "或在定义损失时添加正则化项：    \n",
    "loss = ordinary_loss + tf.losses.get_regularization_loss()  \n",
    "2. 正则化的作用：防止神经网络过度学习而造成\"过拟合\"现象，可以限制权重矩阵 W 的数值, 可视为一种\"惩罚项\"。  \n",
    "3. 得到正则项后将其乘以一个超参数再与交叉熵损失相加构成最终的目标函数。  "
   ]
  },
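  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "The arithmetic of the penalty term can be shown without TensorFlow. A hand-rolled sketch of adding a scaled sum of squared weights to the data loss (note: TF's tf.nn.l2_loss includes an extra factor of 1/2, omitted here):\n",
    "\n",
    "```python\n",
    "import numpy as np\n",
    "\n",
    "def l2_penalty(weights, scale):\n",
    "    # scaled sum of squared entries over every weight matrix\n",
    "    return scale * sum(np.sum(w ** 2) for w in weights)\n",
    "\n",
    "W1 = np.ones((3, 2))       # sum of squares = 6\n",
    "W2 = np.full((2, 2), 2.0)  # sum of squares = 16\n",
    "data_loss = 0.5\n",
    "total_loss = data_loss + l2_penalty([W1, W2], scale=7e-5)\n",
    "print(total_loss)\n",
    "```"
   ]
  },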
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### 5. 使⽤不同的初始化⽅式对模型有什么影响。"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "1. 使用不同的初始化方式可以让训练之初的效果产生差异，比如让收敛速度更快。  \n",
    "2. 特别对于含有局部极小值的情况，通过不同的初始化方式，有可能更加逼近最小值。  \n",
    "3. 老师示范的程式码使用标准差为 0.10 的正态分布，经过测试，0.11，0.115，0.12 的效果都会比 0.10 更好些。    \n",
    "4. 老师课堂内提到, 如果选择 ReLu 做为激活函数的话，权重及偏差项采用 MSRA 进行初始化的效果往往更好，本习题正是这样的例子。  "
   ]
  },
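  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "The effect of scale-aware initialization can be demonstrated with a small NumPy experiment (illustrative; it uses a deliberately small fixed stddev of 0.01 to make signal decay obvious, whereas the 0.10 used in class is less extreme). Random data is pushed through a ReLU stack and the spread of the final activations is compared: a fixed small stddev lets the signal shrink layer by layer, while MSRA/He scaling, sqrt(2 / fan_in), keeps it at a healthy magnitude.\n",
    "\n",
    "```python\n",
    "import numpy as np\n",
    "\n",
    "def final_activation_std(weight_std, sizes, rng):\n",
    "    # forward random data through a ReLU stack and measure the spread\n",
    "    # of the last layer's activations\n",
    "    x = rng.standard_normal((256, sizes[0]))\n",
    "    for m, n in zip(sizes[:-1], sizes[1:]):\n",
    "        W = rng.standard_normal((m, n)) * weight_std(m)\n",
    "        x = np.maximum(x @ W, 0.0)\n",
    "    return x.std()\n",
    "\n",
    "rng = np.random.default_rng(0)\n",
    "sizes = [784, 450, 300, 200, 100]\n",
    "fixed = final_activation_std(lambda fan_in: 0.01, sizes, rng)\n",
    "he = final_activation_std(lambda fan_in: np.sqrt(2.0 / fan_in), sizes, rng)\n",
    "print(fixed, he)  # the fixed-stddev signal is orders of magnitude smaller\n",
    "```"
   ]
  },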
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": []
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python 3",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.7.6"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 4
}
